In this project I seek to visualize the GDP in countries versus population density across the world using an interactive plot, spatial model, and linear model. The goal with these is to put into scope the reality of the massive difference between countries thriving and those who are not. Using the three models we can observe the scope of this reality in several different perspectives which may help people comprehend the stark contrast between major powers in our world. The overall story of these graphics shows the USA at the leading edge but there are several nations coming very close that I wasn’t expecting such as Saudi Arabia and Australia with places like India having a lower GDP but high population density.
For this analysis and summary I chose the data set titled CIACountries which show’s statistics between countries such as oil production, population, GDP, etc. This code initializes the data set and the tidyverse library for use for later. All other libraries will be called when needed. Picking the appropriate data from CIACountries was a challenge as I need to tell a story and with this project we were told the general type of graphics to use. That being said I chose GDP versus population and decided to stick with those data values and explore the different models and how each could represent the data.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
CIA <- read_csv("https://raw.githubusercontent.com/aalhamadani/datasets/refs/heads/main/CIACountries.csv")
## Rows: 236 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country, net_users
## dbl (6): pop, area, oil_prod, gdp, educ, roadways
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
After choosing the first set of data I wanted to use I had to pick on for the graphic types named to use. The interactive plot was the easiest choice as a scatter plot with an interactive element is a straightforward way to communicate data which is easily digestable. Once done I realized I could effectively up the complexity of the data representation with each model type which will be discussed in the next section. For this graphic I focused on displaying both GDP and Population while also allowing the user to hover over dots to see the country statistics in full from the .csv file. This repesents all data while putting it into what I want to focus on.
library(plotly);
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
p <- ggplot(CIA, aes(x = pop, y = gdp, text = paste("Country:", country, "<br>Population:", pop, "<br>GDP:", gdp, "<br>Area:", area, "<br>Oil Production:", oil_prod, "<br>Internet Users:", net_users))) +
geom_point(aes(size = area, color = oil_prod), alpha = 0.7) +
scale_x_log10(labels = scales::comma) +
scale_y_log10(labels = scales::dollar) +
labs(title = "GDP vs. Population by Country", x = "Population (log scale)", y = "GDP (log scale)",color = "Oil Production", size = "Area (sq km)") +
theme_minimal();
ggplotly(p, tooltip = "text")
For this graphic I put together a map that has a clear dark outline for each country which allows for the hues in the palette of RdYlBu to show nicely. In this case the lighter the country is the better GDP they have and the more in the red they are the worse off the GDP is. When originally putting together this graphic the USA was white which meant no data available for the country. After some throubleshooting and research I was able to get the proper tone to display, however, the same issue is happening for several African nations and I could not get them to appear. Regardless, the graphic does show a visual distinctness with many north-western countries being well off and many eastern and southern countries being in a rough spot with GDP.
library(leaflet)
library(sf)
## Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
library(rnaturalearth)
CIA$country[CIA$country == "United States"] <- "United States of America"
world <- ne_countries(scale = "medium", returnclass = "sf")
world_cia <- merge(world, CIA, by.x = "name", by.y = "country", all.x = TRUE)
pal <- colorNumeric("RdYlBu", domain = world_cia$gdp, na.color = "transparent")
leaflet(world_cia) %>%
addTiles() %>%
addPolygons(fillColor = ~pal(gdp), color = "black", weight = 1, fillOpacity = 0.8) %>%
addLegend(position = "bottomright", pal = pal, values = ~gdp, title = "GDP per Capita", opacity = 0.8)
Finally we have the linear fit model which in this case is very warped by several of the larger population nations. Even after eliminating all of the major outliers the fit does not change much. This median line, in this case, makes sense as there are many more lower GDP/Population countries than big ones. What this graphic aims to show is that the average country on earth has a pretty low GDP when compared to some of the western countries and population, while most certainly having an impact, can be a hindrance depending on how that population is functioning.
# Simple linear regression: GDP vs Population
model <- lm(gdp ~ pop, data = CIA)
ggplot(CIA, aes(x = pop, y = gdp)) +
geom_point(alpha = 0.7) +
scale_x_continuous() +
geom_smooth(method = "lm", se = TRUE, color = "blue") +
labs(title = "Relationship between Population and GDP", x = "Population", y = "GDP")+
coord_cartesian(xlim = c(0, 4e8))
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 8 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_point()`).
Exploring the CIACountries data set showed that population and GDP have a pretty close correlation and that for some countries, population bloat is definitely a drain on GDP, we can see this in China and India for example. Other countries have less population and less GDP which makes sense and some like the USA have a sizable population and high GDP. Geographically this could be linked to raw material exports or industry but regardless it’s sobering to see as here in the USA we tend to take our status for granted as we are so used to it. Furthermore, the specific data visualizations push this narrative well by displaying data in increasingly more complicated models which shows the reader progression and insight into the data as well as the narrative.